Search for: All records

Creators/Authors contains: "Trouille, Laura"

« Prev Next »

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Through the Citizen Scientists’ Eyes: Insights into Using Citizen Science with Machine Learning for Effective Identification of Unknown-Unknowns in Big Data

https://doi.org/10.5334/cstp.740

Mantha, Kameswara Bharadwaj; Roberts, Hayley; Fortson, Lucy; Lintott, Chris; Dickinson, Hugh; Keel, William; Sankar, Ramanakumar; Krawczyk, Coleman; Simmons, Brooke; Walmsley, Mike; et al (December 2024, Citizen Science: Theory and Practice)
Fortson, Lucy; Crowston, Kevin; Kloetzer, Laure; Ponti, Marisa (Ed.)
In the era of rapidly growing astronomical data, the gap between data collection and analysis is a significant barrier, especially for teams searching for rare scientific objects. Although machine learning (ML) can quickly parse large data sets, it struggles to robustly identify scientifically interesting objects, a task at which humans excel. Human-in-the-loop (HITL) strategies that combine the strengths of citizen science (CS) and ML offer a promising solution, but first, we need to better understand the relationship between human- and machine-identified samples. In this work, we present a case study from the Galaxy Zoo: Weird & Wonderful project, where volunteers inspected ~200,000 astronomical images—processed by an ML-based anomaly detection model—to identify those with unusual or interesting characteristics. Volunteer-selected images with common astrophysical characteristics had higher consensus, while rarer or more complex ones had lower consensus. This suggests low-consensus choices shouldn’t be dismissed in further explorations. Additionally, volunteers were better at filtering out uninteresting anomalies, such as image artifacts, which the machine struggled with. We also found that a higher ML-generated anomaly score that indicates images’ low-level feature anomalousness was a better predictor of the volunteers’ consensus choice. Combining a locus of high volunteer-consensus images within the ML learnt feature space and anomaly score, we demonstrated a decision boundary that can effectively isolate images with unusual and potentially scientifically interesting characteristics. Using this case study, we lay important guidelines for future research studies looking to adapt and operationalize human-machine collaborative frameworks for efficient anomaly detection in big data.
more » « less
Full Text Available
A New Curriculum Development Model for Improving Undergraduate Students’ Data Literacy and Self-Efficacy in Online Astronomy Classrooms

https://doi.org/10.32374/AEJ.2022.2.1.043ra

Simon, Molly; Prather, Edward; Rosenthal, Isaac; Cassidy, Michael; Hammerman, James; Trouille, Laura (October 2022, Astronomy Education Journal)

There is a critical need for research-based active learning instructional materials for the teaching and learning of STEM in online courses. Every year, hundreds of thousands of undergraduate non-science majors enroll in general education astronomy courses to fulfill their institution’s liberal arts requirements. When designing instructional materials for this population of learners, a central focus must be to help learners become more scientifically and data literate. As such, we developed a new, three-part, curricular model that was used to inform the creation of active-learning instructional materials designed for use in online courses. The instructional materials were designed to help introductory astronomy students engage meaningfully with science while simultaneously improving their data literacy self-efficacy (especially as it pertained to making evidence-based conclusions when presented with a variety of data representations). We conducted a pilot study of these instructional materials at nine different colleges and universities to better understand whether students’ engagement with these materials lead to improved beliefs and self-efficacy. The results of our student survey analysis showed statistically significant changes on survey items that assessed students’ beliefs about science engagement, citizen science, and their data literacy skills. Additionally, we assessed whether faculty who implemented these materials were able to easily incorporate them into existing online astronomy courses. The instructor feedback emphasized that our curriculum development model did successfully inform the creation of easy-to-implement instructional materials, generating the potential for widespread dissemination and use at the undergraduate level.
more » « less
Full Text Available
Gravity Spy: lessons learned and a path forward

https://doi.org/10.1140/epjp/s13360-023-04795-4

Zevin, Michael; Jackson, Corey B; Doctor, Zoheyr; Wu, Yunan; Østerlund, Carsten; Johnson, L Clifton; Berry, Christopher_P L; Crowston, Kevin; Coughlin, Scott B; Kalogera, Vicky; et al (January 2024, The European Physical Journal Plus)

Abstract The Gravity Spy project aims to uncover the origins of glitches, transient bursts of noise that hamper analysis of gravitational-wave data. By using both the work of citizen-science volunteers and machine learning algorithms, the Gravity Spy project enables reliable classification of glitches. Citizen science and machine learning are intrinsically coupled within the Gravity Spy framework, with machine learning classifications providing a rapid first-pass classification of the dataset and enabling tiered volunteer training, and volunteer-based classifications verifying the machine classifications, bolstering the machine learning training set and identifying new morphological classes of glitches. These classifications are now routinely used in studies characterizing the performance of the LIGO gravitational-wave detectors. Providing the volunteers with a training framework that teaches them to classify a wide range of glitches, as well as additional tools to aid their investigations of interesting glitches, empowers them to make discoveries of new classes of glitches. This demonstrates that, when giving suitable support, volunteers can go beyond simple classification tasks to identify new features in data at a level comparable to domain experts. The Gravity Spy project is now providing volunteers with more complicated data that includes auxiliary monitors of the detector to identify the root cause of glitches.
more » « less
Full Text Available
Data quality up to the third observing run of Advanced LIGO: Gravity Spy glitch classifications

https://doi.org/10.1088/1361-6382/acb633

Glanzer, Jane; Banagari, Sharan; Coughlin, Scott; Soni, Siddharth; Zevin, Michael; Berry, Christopher Philip; Patane, Oli; Bahaadini, Sara; Rohani, Neda; Crowston, Kevin; et al (January 2023, Classical and Quantum Gravity)

Understanding the noise in gravitational-wave detectors is central to detecting and interpreting gravitational-wave signals. Glitches are transient, non-Gaussian noise features that can have a range of environmental and instrumental origins. The Gravity Spy project uses a machine-learning algorithm to classify glitches based upon their time–frequency morphology. The resulting set of classified glitches can be used as input to detector-characterisation investigations of how to mitigate glitches, or data-analysis studies of how to ameliorate the impact of glitches. Here we present the results of the Gravity Spy analysis of data up to the end of the third observing run of Advanced LIGO. We classify 233981 glitches from LIGO Hanford and 379805 glitches from LIGO Livingston into morphological classes. We find that the distribution of glitches differs between the two LIGO sites. This highlights the potential need for studies of data quality to be individually tailored to each gravitational-wave observatory.
more » « less
Full Text Available
Gravity Spy Volunteer Classifications of LIGO Glitches

https://doi.org/10.5281/zenodo.13904421

Crowston, Kevin; Allen, Sara; Bahaadini, Sara; Banagiri, Sharan; Berry, Christopher_P L; Charlton, Emma; Chase, Eve; Corieri, Isabella; Coughlin, Scott B; Crowston, Kevin; et al (January 2024, Zenodo)

This data set contains the individual classifications that the Gravity Spy citizen science volunteers made for glitches through 20 July 2024. Classifications made by science team members or in testing workflows have been removed as have classifications of glitches lacking a Gravity Spy identifier. See Zevin et al. (2017) for an explanation of the citizen science task and classification interface. Data about glitches with machine-learning labels are provided in an earlier data release (Glanzer et al., 2021). Final classifications combining ML and volunteer classifications are provided in Zevin et al. (2022). 22 of the classification labels match the labels used in the earlier data release, namely 1080Lines, 1400Ripples, Air_Compressor, Blip, Chirp, Extremely_Loud, Helix, Koi_Fish, Light_Modulation, Low_Frequency_Burst, Low_Frequency_Lines, No_Glitch, None_of_the_Above, Paired_Doves, Power_Line, Repeating_Blips, Scattered_Light, Scratchy, Tomte, Violin_Mode, Wandering_Line and Whistle. One glitch class that was added to the machine-learning classification has not been added to the Zooniverse project and so does not appear in this file, namely Blip_Low_Frequency. Four classes were added to the citizen science platform but not to the machine learning model and so have only volunteer labels, namely 70HZLINE, HIGHFREQUENCYBURST, LOWFREQUENCYBLIP and PIZZICATO. The glitch class Fast_Scattering added to the machine-learning classification has an equivalent volunteer label CROWN, which is used here (Soni et al. 2021). Glitches are presented to volunteers in a succession of workflows. Workflows include glitches classified by a machine learning classifier as being likely to be in a subset of classes and offer the option to classify only those classes plus None_of_the_Above. Each level includes the classes available in lower levels. The top level does not add new classification options but includes all glitches, including those for which the machine learning model is uncertain of the class. As the classes available to the volunteers change depending on the workflow, a glitch might be classified as None_of_the_Above in a lower workflow and subsequently as a different class in a higher workflow. Workflows and available classes are shown in the table below. Workflow ID Name Number of glitch classes Glitches added 1610 Level 1 3 Blip, Whistle, None_of_the_Above 1934 Level 2 6 Koi_Fish, Power_Line, Violin_Mode 1935 Level 3 10 Chirp, Low_Frequency_Burst, No_Glitch, Scattered_Light 2360 Original level 4 22 1080Lines, 1400Ripples, Air_Compressor, Extremely_Loud, Helix, Light_Modulation, Low_Frequency_Lines, Paired_Doves, Repeating_Blips, Scratchy, Tomte, Wandering_Line 7765 New level 4 15 1080Lines, Extremely_Loud, Low_Frequency_Lines, Repeating_Blips, Scratchy 2117 Original level 5 22 No new glitch classes 7766 New level 5 27 1400Ripples, Air_Compressor, Paired_Doves, Tomte, Wandering_Line, 70HZLINE, CROWN, HIGHFREQUENCYBURST, LOWFREQUENCYBLIP, PIZZICATO 7767 Level 6 27 No new glitch classes Description of data fields Classification_id: a unique identifier for the classification. A volunteer may choose multiple classes for a glitch when classifying, in which case there will be multiple rows with the same classification_id. Subject_id: a unique identifier for the glitch being classified. This field can be used to join the classification to data about the glitch from the prior data release. User_hash: an anonymized identifier for the user making the classification or for anonymous users an identifier that can be used to track the user within a session but which may not persist across sessions. Anonymous_user: True if the classification was made by a non-logged in user. Workflow: The Gravity Spy workflow in which the classification was made. Workflow_version: The version of the workflow. Timestamp: Timestamp for the classification. Classification: Glitch class selected by the volunteer. Related datasets For machine learning classifications on all glitches in O1, O2, O3a, and O3b, please see Gravity Spy Machine Learning Classifications on Zenodo For classifications of glitches combining machine learning and volunteer classifications, please see Gravity Spy Volunteer Classifications of LIGO Glitches from Observing Runs O1, O2, O3a, and O3b. For the training set used in Gravity Spy machine learning algorithms, please see Gravity Spy Training Set on Zenodo. For detailed information on the training set used for the original Gravity Spy machine learning paper, please see Machine learning for Gravity Spy: Glitch classification and dataset on Zenodo.
more » « less
U!Scientist: Designing for People-Powered Research in Museums

https://doi.org/10.1145/3411764.3445334

Obiorah, Mmachi God'sglory; Hammerman, James K.; Rother, Becky; Granger, Will; West, Haley Margaret; Horn, Michael; Trouille, Laura (May 2021, CHI '21: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems)
null (Ed.)
Scientists have long sought to engage public audiences in research through citizen science projects such as biological surveys or distributed data collection. Recent online platforms have expanded the scope of what people-powered research can mean. Science museums are unique cultural institutions that translate scientific discovery for public audiences, while conducting research of their own. This makes museums compelling sites for engaging audiences directly in scientific research, but there are associated challenges as well. This project engages public audiences in contributing to real research as part of their visit to a museum. We present the design and evaluation of U!Scientist, an interactive multi-person tabletop exhibit based on the online Zooniverse project, Galaxy Zoo. We installed U!Scientist in a planetarium and collected video, computer logs, naturalistic observations, and surveys with visitors. Our findings demonstrate the potential of exhibits to engage new audiences in collaborative scientific discussions as part of people-powered research.
more » « less
Full Text Available
Gravity Spy Volunteer Classifications of LIGO Glitches from Observing Runs O1, O2, O3a, and O3b

https://doi.org/10.5281/zenodo.5911226

Zevin, Michael; Coughlin, Scott; Chase, Eve; Allen, Sara; Bahaadini, Sara; Berry, Christopher; Crowston, Kevin; Harandi, Mabi; Jackson, Corey; Kalogera, Vicky; et al (January 2022, Zenodo)

{"Abstract":["This dataset contains machine learning and volunteer classifications from the Gravity Spy project. It includes glitches from observing runs O1, O2, O3a and O3b that received at least one classification from a registered volunteer in the project. It also indicates glitches that are nominally retired from the project using our default set of retirement parameters, which are described below. See more details in the Gravity Spy Methods paper. <\/p>\n\nWhen a particular subject in a citizen science project (in this case, glitches from the LIGO datastream) is deemed to be classified sufficiently it is "retired" from the project. For the Gravity Spy project, retirement depends on a combination of both volunteer and machine learning classifications, and a number of parameterizations affect how quickly glitches get retired. For this dataset, we use a default set of retirement parameters, the most important of which are: <\/p>\n\nA glitches must be classified by at least 2 registered volunteers<\/li>Based on both the initial machine learning classification and volunteer classifications, the glitch has more than a 90% probability of residing in a particular class<\/li>Each volunteer classification (weighted by that volunteer's confusion matrix) contains a weight equal to the initial machine learning score when determining the final probability<\/li><\/ol>\n\nThe choice of these and other parameterization will affect the accuracy of the retired dataset as well as the number of glitches that are retired, and will be explored in detail in an upcoming publication (Zevin et al. in prep). <\/p>\n\nThe dataset can be read in using e.g. Pandas: \n```\nimport pandas as pd\ndataset = pd.read_hdf('retired_fulldata_min2_max50_ret0p9.hdf5', key='image_db')\n```\nEach row in the dataframe contains information about a particular glitch in the Gravity Spy dataset. <\/p>\n\nDescription of series in dataframe<\/strong><\/p>\n\n['1080Lines', '1400Ripples', 'Air_Compressor', 'Blip', 'Chirp', 'Extremely_Loud', 'Helix', 'Koi_Fish', 'Light_Modulation', 'Low_Frequency_Burst', 'Low_Frequency_Lines', 'No_Glitch', 'None_of_the_Above', 'Paired_Doves', 'Power_Line', 'Repeating_Blips', 'Scattered_Light', 'Scratchy', 'Tomte', 'Violin_Mode', 'Wandering_Line', 'Whistle']\n\tMachine learning scores for each glitch class in the trained model, which for a particular glitch will sum to unity<\/li><\/ul>\n\t<\/li>['ml_confidence', 'ml_label']\n\tHighest machine learning confidence score across all classes for a particular glitch, and the class associated with this score<\/li><\/ul>\n\t<\/li>['gravityspy_id', 'id']\n\tUnique identified for each glitch on the Zooniverse platform ('gravityspy_id') and in the Gravity Spy project ('id'), which can be used to link a particular glitch to the full Gravity Spy dataset (which contains GPS times among many other descriptors)<\/li><\/ul>\n\t<\/li>['retired']\n\tMarks whether the glitch is retired using our default set of retirement parameters (1=retired, 0=not retired)<\/li><\/ul>\n\t<\/li>['Nclassifications']\n\tThe total number of classifications performed by registered volunteers on this glitch<\/li><\/ul>\n\t<\/li>['final_score', 'final_label']\n\tThe final score (weighted combination of machine learning and volunteer classifications) and the most probable type of glitch<\/li><\/ul>\n\t<\/li>['tracks']\n\tArray of classification weights that were added to each glitch category due to each volunteer's classification<\/li><\/ul>\n\t<\/li><\/ul>\n\n <\/p>\n\n```\nFor machine learning classifications on all glitches in O1, O2, O3a, and O3b, please see Gravity Spy Machine Learning Classifications on Zenodo<\/p>\n\nFor the most recently uploaded training set used in Gravity Spy machine learning algorithms, please see Gravity Spy Training Set on Zenodo.<\/p>\n\nFor detailed information on the training set used for the original Gravity Spy machine learning paper, please see Machine learning for Gravity Spy: Glitch classification and dataset on Zenodo. <\/p>"]}
more » « less
Citizen science frontiers: Efficiency, engagement, and serendipitous discovery with human–machine systems

https://doi.org/10.1073/pnas.1807190116

Trouille, Laura; Lintott, Chris J.; Fortson, Lucy F. (February 2019, Proceedings of the National Academy of Sciences)

Citizen science has proved to be a unique and effective tool in helping science and society cope with the ever-growing data rates and volumes that characterize the modern research landscape. It also serves a critical role in engaging the public with research in a direct, authentic fashion and by doing so promotes a better understanding of the processes of science. To take full advantage of the onslaught of data being experienced across the disciplines, it is essential that citizen science platforms leverage the complementary strengths of humans and machines. This Perspectives piece explores the issues encountered in designing human–machine systems optimized for both efficiency and volunteer engagement, while striving to safeguard and encourage opportunities for serendipitous discovery. We discuss case studies from Zooniverse, a large online citizen science platform, and show that combining human and machine classifications can efficiently produce results superior to those of either one alone and how smart task allocation can lead to further efficiencies in the system. While these examples make clear the promise of human–machine integration within an online citizen science system, we then explore in detail how system design choices can inadvertently lower volunteer engagement, create exclusionary practices, and reduce opportunity for serendipitous discovery. Throughout we investigate the tensions that arise when designing a human–machine system serving the dual goals of carrying out research in the most efficient manner possible while empowering a broad community to authentically engage in this research.
more » « less
Full Text Available
Gravity Spy Machine Learning Classifications of LIGO Glitches from Observing Runs O1, O2, O3a, and O3b

https://doi.org/10.5281/zenodo.5649211

Coughlin, Scott; Zevin, Michael; Bahaadini, Sara; Rohani, Neda; Allen, Sara; Berry, Christopher; Crowston, Kevin; Harandi, Mabi; Jackson, Corey; Kalogera, Vicky; et al (January 2021, Zenodo)

{"Abstract":["This data set contains all classifications that the Gravity Spy Machine Learning model for LIGO glitches from the first three observing runs (O1, O2 and O3, where O3 is split into O3a and O3b). Gravity Spy classified all noise events identified by the Omicron trigger pipeline in which Omicron identified that the signal-to-noise ratio was above 7.5 and the peak frequency of the noise event was between 10 Hz and 2048 Hz. To classify noise events, Gravity Spy made Omega scans of every glitch consisting of 4 different durations, which helps capture the morphology of noise events that are both short and long in duration.<\/p>\n\nThere are 22 classes used for O1 and O2 data (including No_Glitch and None_of_the_Above), while there are two additional classes used to classify O3 data.<\/p>\n\nFor O1 and O2, the glitch classes were: 1080Lines, 1400Ripples, Air_Compressor, Blip, Chirp, Extremely_Loud, Helix, Koi_Fish, Light_Modulation, Low_Frequency_Burst, Low_Frequency_Lines, No_Glitch, None_of_the_Above, Paired_Doves, Power_Line, Repeating_Blips, Scattered_Light, Scratchy, Tomte, Violin_Mode, Wandering_Line, Whistle<\/p>\n\nFor O3, the glitch classes were: 1080Lines, 1400Ripples, Air_Compressor, Blip, Blip_Low_Frequency<\/strong>, Chirp, Extremely_Loud, Fast_Scattering<\/strong>, Helix, Koi_Fish, Light_Modulation, Low_Frequency_Burst, Low_Frequency_Lines, No_Glitch, None_of_the_Above, Paired_Doves, Power_Line, Repeating_Blips, Scattered_Light, Scratchy, Tomte, Violin_Mode, Wandering_Line, Whistle<\/p>\n\nIf you would like to download the Omega scans associated with each glitch, then you can use the gravitational-wave data-analysis tool GWpy. If you would like to use this tool, please install anaconda if you have not already and create a virtual environment using the following command<\/p>\n\n```conda create --name gravityspy-py38 -c conda-forge python=3.8 gwpy pandas psycopg2 sqlalchemy```<\/p>\n\nAfter downloading one of the CSV files for a specific era and interferometer, please run the following Python script if you would like to download the data associated with the metadata in the CSV file. We recommend not trying to download too many images at one time. For example, the script below will read data on Hanford glitches from O2 that were classified by Gravity Spy and filter for only glitches that were labelled as Blips with 90% confidence or higher, and then download the first 4 rows of the filtered table.<\/p>\n\n```<\/p>\n\nfrom gwpy.table import GravitySpyTable<\/p>\n\nH1_O2 = GravitySpyTable.read('H1_O2.csv')<\/p>\n\nH1_O2[(H1_O2["ml_label"] == "Blip") & (H1_O2["ml_confidence"] > 0.9)]<\/p>\n\nH1_O2[0:4].download(nproc=1)<\/p>\n\n```<\/p>\n\nEach of the columns in the CSV files are taken from various different inputs: <\/p>\n\n[\u2018event_time\u2019, \u2018ifo\u2019, \u2018peak_time\u2019, \u2018peak_time_ns\u2019, \u2018start_time\u2019, \u2018start_time_ns\u2019, \u2018duration\u2019, \u2018peak_frequency\u2019, \u2018central_freq\u2019, \u2018bandwidth\u2019, \u2018channel\u2019, \u2018amplitude\u2019, \u2018snr\u2019, \u2018q_value\u2019] contain metadata about the signal from the Omicron pipeline. <\/p>\n\n[\u2018gravityspy_id\u2019] is the unique identifier for each glitch in the dataset. <\/p>\n\n[\u20181400Ripples\u2019, \u20181080Lines\u2019, \u2018Air_Compressor\u2019, \u2018Blip\u2019, \u2018Chirp\u2019, \u2018Extremely_Loud\u2019, \u2018Helix\u2019, \u2018Koi_Fish\u2019, \u2018Light_Modulation\u2019, \u2018Low_Frequency_Burst\u2019, \u2018Low_Frequency_Lines\u2019, \u2018No_Glitch\u2019, \u2018None_of_the_Above\u2019, \u2018Paired_Doves\u2019, \u2018Power_Line\u2019, \u2018Repeating_Blips\u2019, \u2018Scattered_Light\u2019, \u2018Scratchy\u2019, \u2018Tomte\u2019, \u2018Violin_Mode\u2019, \u2018Wandering_Line\u2019, \u2018Whistle\u2019] contain the machine learning confidence for a glitch being in a particular Gravity Spy class (the confidence in all these columns should sum to unity). <\/p>\n\n[\u2018ml_label\u2019, \u2018ml_confidence\u2019] provide the machine-learning predicted label for each glitch, and the machine learning confidence in its classification. <\/p>\n\n[\u2018url1\u2019, \u2018url2\u2019, \u2018url3\u2019, \u2018url4\u2019] are the links to the publicly-available Omega scans for each glitch. \u2018url1\u2019 shows the glitch for a duration of 0.5 seconds, \u2018url2\u2019 for 1 seconds, \u2018url3\u2019 for 2 seconds, and \u2018url4\u2019 for 4 seconds.<\/p>\n\n```<\/p>\n\nFor the most recently uploaded training set used in Gravity Spy machine learning algorithms, please see Gravity Spy Training Set on Zenodo.<\/p>\n\nFor detailed information on the training set used for the original Gravity Spy machine learning paper, please see Machine learning for Gravity Spy: Glitch classification and dataset on Zenodo. <\/p>"]}
more » « less
A transient search using combined human and machine classifications

https://doi.org/10.1093/mnras/stx1812

Wright, Darryl E.; Lintott, Chris J.; Smartt, Stephen J.; Smith, Ken W.; Fortson, Lucy; Trouille, Laura; Allen, Campbell R.; Beck, Melanie; Bouslog, Mark C.; Boyer, Amy; et al (July 2017, Monthly Notices of the Royal Astronomical Society)

Full Text Available